Self-Driving Car Engineer Nanodegree

Deep Learning

Project: Build a Traffic Sign Recognition Classifier

In this notebook, a template is provided for you to implement your functionality in stages, which is required to successfully complete this project. If additional code is required that cannot be included in the notebook, be sure the Python code is successfully imported and included in your submission. Sections that begin with 'Implementation' in the header indicate where you should begin your implementation. Note that some implementation sections are optional and will be marked with 'Optional' in the header.

In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation. Each section where you will answer a question is preceded by a 'Question' header. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'. Your project submission will be evaluated based on your answers to each of the questions and the implementation you provide.

Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. In addition, Markdown cells can be edited, typically by double-clicking the cell to enter edit mode.


Step 0: Load The Data

In [2]:
import cv2
import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt
from skimage import io
import tensorflow as tf
from tensorflow.contrib.layers import flatten
In [3]:
# Load pickled data
import pickle
print("Loading Data...")
# TODO: Fill this in based on where you saved the training and testing data

training_file = './train.p'
testing_file = './test.p'

with open(training_file, mode='rb') as f:
    train = pickle.load(f)
with open(testing_file, mode='rb') as f:
    test = pickle.load(f)
    
train_features, train_labels = train['features'], train['labels']
test_features, test_labels = test['features'], test['labels']

print("Loading Complete.")
Loading Data...
Loading Complete.

Step 1: Dataset Summary & Exploration

The pickled data is a dictionary with 4 key/value pairs:

  • 'features' is a 4D array containing raw pixel data of the traffic sign images, (num examples, width, height, channels).
  • 'labels' is a 2D array containing the label/class id of the traffic sign. The file signnames.csv contains id -> name mappings for each id.
  • 'sizes' is a list containing tuples, (width, height), representing the original width and height of the image.
  • 'coords' is a list containing tuples, (x1, y1, x2, y2) representing coordinates of a bounding box around the sign in the image. THESE COORDINATES ASSUME THE ORIGINAL IMAGE. THE PICKLED DATA CONTAINS RESIZED VERSIONS (32 by 32) OF THESE IMAGES

Complete the basic data summary below.

In [3]:
### Replace each question mark with the appropriate value.

# TODO: Number of training examples
n_train = len(train_features)

# TODO: Number of testing examples.
n_test = len(test_features)

# TODO: What's the shape of a traffic sign image?
image_shape = train_features[0].shape

# TODO: How many unique classes/labels are there in the dataset?
n_classes = max(train_labels) + 1

print("Number of training examples =", n_train)
print("Number of testing examples =", n_test)
print("Image data shape =", image_shape)
print("Number of classes =", n_classes)
Number of training examples = 39209
Number of testing examples = 12630
Image data shape = (32, 32, 3)
Number of classes = 43
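
As a side note, `max(train_labels) + 1` only counts classes correctly when the ids are contiguous and start at 0. A small numpy sketch (with hypothetical labels, not the real dataset) shows why `np.unique` is the more defensive choice:

```python
import numpy as np

# Hypothetical label array; the real labels come from train['labels'].
labels = np.array([0, 2, 2, 5, 1, 5, 5])

# Robust count: the number of distinct ids, regardless of gaps.
n_unique = len(np.unique(labels))

# The max-id approach assumes every id in 0..max is present.
n_by_max = int(labels.max()) + 1

print(n_unique)  # 4 distinct classes (0, 1, 2, 5)
print(n_by_max)  # 6, because ids 3 and 4 are missing
```

For this dataset the two agree (ids 0 through 42 are all present), so the simpler expression is safe here.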

Visualize the German Traffic Signs Dataset using the pickled file(s). This is open ended; suggestions include plotting traffic sign images, plotting the count of each sign, etc.

The Matplotlib examples and gallery pages are a great resource for doing visualizations in Python.

NOTE: It's recommended you start with something simple first. If you wish to do more, come back to it after you've completed the rest of the sections.

In [4]:
### Data exploration visualization goes here.
### Feel free to use as many code cells as needed.

print("Visualizing Data...")

train_features = np.array(train['features'])
train_labels = np.array(train['labels'])
input_counts = np.bincount(train_labels)
max_input = np.max(input_counts)
min_input = np.min(input_counts)
print("The maximum inputs per class is:", max_input)
print("The minimum inputs per class is:", min_input)
print(sum(input_counts))

### Visualize training data ###

figure1 = plt.figure()
a = figure1.add_subplot(111)
a.set_title('Distribution of Training Samples')
a.set_xlabel('Class')
a.set_ylabel('Number of Samples')
a.bar(range(len(input_counts)), input_counts, 1, color='orange')
plt.show()

### Visualize training images with labels ###

for i in range(n_classes):
    for j in range(len(train_labels)):
        if (i == train_labels[j]):
            print('Class: ', i)
            plt.imshow(train_features[j])
            plt.show()
            break

print("Data Visualization Complete.")
Visualizing Data...
The maximum inputs per class is: 2250
The minimum inputs per class is: 210
39209
Class:  0 through Class:  42 (one sample image displayed for each of the 43 classes)
Data Visualization Complete.

Step 2: Design and Test a Model Architecture

Design and implement a deep learning model that learns to recognize traffic signs. Train and test your model on the German Traffic Sign Dataset.

There are various aspects to consider when thinking about this problem:

  • Neural network architecture
  • Play around with preprocessing techniques (normalization, RGB to grayscale, etc.)
  • Number of examples per label (some have more than others).
  • Generate fake data.

Here is an example of a published baseline model on this problem. It's not required to be familiar with the approach used in the paper, but it's good practice to try to read papers like these.

NOTE: The LeNet-5 implementation shown in the classroom at the end of the CNN lesson is a solid starting point. You'll have to change the number of classes and possibly the preprocessing, but aside from that it's plug and play!

In [5]:
### Generate additional data (OPTIONAL!)
### and split the data into training/validation/testing sets here.
### Feel free to use as many code cells as needed.
In [6]:
### Preprocess the data here.
### Feel free to use as many code cells as needed.

### Create additional images to compensate for the uneven class distribution ###

print("Generating additional (rotated) images...")
angles = list(range(10, -10, -1))

for i in range(len(input_counts)):
    input_class_ratio = min(int(max_input / input_counts[i]) - 1, len(angles) - 1)
    if input_class_ratio <= 1:
        continue

    new_train_features = []
    new_train_labels = []
    train_mask1 = np.where(train_labels == i)

    for j in range(input_class_ratio):
        for feature in train_features[train_mask1]:
            new_train_features.append(ndimage.rotate(feature, angles[j], reshape=False))
            new_train_labels.append(i)
                    
    train_features = np.append(train_features, new_train_features, axis=0)
    train_labels = np.append(train_labels, new_train_labels, axis=0)

print('Additional data generation complete.')
print('New training set size with rotated images: ', len(train_features))   
Generating additional (rotated) images...
Additional data generation complete.
New training set size with rotated images:  80337
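
The capping rule used above (extra rotated copies per original example, limited by the number of available rotation angles) can be checked on a toy distribution. The counts below are hypothetical stand-ins, not the real per-class totals:

```python
import numpy as np

# Hypothetical per-class counts: the first class is common, the last is rare.
input_counts = np.array([2250, 1200, 210])
max_input = int(input_counts.max())
n_angles = 20  # rotation angles available: 10 down to -9 degrees in 1-degree steps

# Same capping rule as above: extra copies per original example,
# limited by the number of distinct rotation angles.
ratios = [min(int(max_input / c) - 1, n_angles - 1) for c in input_counts]
print(ratios)  # [0, 0, 9] -- only the rare class gets rotated copies
```

Well-represented classes come out at 0 and are skipped, while the rarest class here gets 9 rotated copies of each original image.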
In [8]:
def class_distribution(y, n_classes):
    labels = np.array(range(0, n_classes))
    n_examples = np.zeros(n_classes, dtype=np.int32)
    for i in labels:
        # Count the number of examples in y that belong to class i
        n_examples[i] = sum(y == i)
    return (labels, n_examples)


### Transformation Functions ###

# Image transform 1 (squeeze right)
def transform_1(image):
    p1 = np.float32([[0,0],[32,0],[0,32],[32,32]])
    p2 = np.float32([[0,0],[32,5],[0,32],[32,27]])

    orig = cv2.getPerspectiveTransform(p1, p2)
    new = cv2.warpPerspective(image,orig,(32,32))
    return new

# Image transform 2 (squeeze left)
def transform_2(image):
    p1 = np.float32([[0,0],[32,0],[0,32],[32,32]])
    p2 = np.float32([[0,5],[32,0],[0,27],[32,32]])

    orig = cv2.getPerspectiveTransform(p1, p2)
    new = cv2.warpPerspective(image,orig,(32,32))
    return new

# Image transform 3 (stretch image)
def transform_3(image):
    p1 = np.float32([[0,0],[32,0],[0,32],[32,32]])
    p2 = np.float32([[0,0],[27,5],[5,32],[32,27]])

    orig = cv2.getPerspectiveTransform(p1, p2)
    new = cv2.warpPerspective(image,orig,(32,32))
    return new

# Image transform 4 (stretch image 2)
def transform_4(image):
    p1 = np.float32([[0,0],[32,0],[0,32],[32,32]])
    p2 = np.float32([[5,5],[32,0],[0,27],[27,32]])

    orig = cv2.getPerspectiveTransform(p1, p2)
    new = cv2.warpPerspective(image,orig,(32,32))
    return new

# Use random number to decide which image transformation to use
def new_image(image):
    transform_number = np.random.randint(0, 4)
    
    if transform_number == 0:
        return transform_1(image)    
    elif transform_number == 1:
        return transform_2(image)    
    elif transform_number == 2:
        return transform_3(image)
    else:
        return transform_4(image)        
    
def generate_data(X, y, orig_class_distribution, multiplication_factor = 4):
    X_new = []
    y_new = []
    
    # Get minimum final size of new training set, as well as number of images per class
    final_size = multiplication_factor * X.shape[0]
    n_classes = len(orig_class_distribution)
    n_images_per_class = int(np.ceil(final_size / n_classes))
    
    for i in range(X.shape[0]):
        # Compute the number of additional images needed for this example's class
        class_idx = y[i]
        current_count = orig_class_distribution[class_idx]
        n_new_images = int(np.ceil((n_images_per_class - current_count) / current_count))

        # Create new images with the transformation functions
        for j in range(n_new_images):
            X_new.append(new_image(X[i]))
            y_new.append(y[i])
        
    return (np.array(X_new), np.array(y_new))

def add_transform_data(X, y, orig_class_distribution):
    X_new, y_new = generate_data(X, y, orig_class_distribution)
    
    X = np.concatenate((X, X_new), axis = 0)
    y = np.concatenate((y, y_new), axis = 0)
    
    return (X, y)   
In [9]:
print("Generating additional (transformed) images...")

### Use transformations to create even more data ###

_, old_distribution = class_distribution(train_labels, n_classes)
train_features, train_labels = add_transform_data(train_features, train_labels, old_distribution)

print('Additional data generation complete.')
print('New training set size with transformed and rotated images: ', len(train_features))

### Visualize updated class distribution after adding more data ###

input_counts = np.bincount(train_labels)
figure2 = plt.figure()
ax = figure2.add_subplot(111)
ax.set_title('Number of inputs per class w/ Additional Data')
ax.set_xlabel('Class')
ax.set_ylabel('Number of Inputs')
ax.bar(range(len(input_counts)), input_counts, 1, color='green')
plt.show()
Generating additional (transformed) images...
Additional data generation complete.
New training set size with transformed and rotated images:  360498
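
The arithmetic inside `generate_data` decides how many transformed copies each existing example gets so that every class approaches the same target count. A sketch with hypothetical counts and the same 4x growth factor:

```python
import numpy as np

# Hypothetical class counts and the 4x growth target used above.
counts = np.array([100, 50, 25])
multiplication_factor = 4

final_size = multiplication_factor * int(counts.sum())      # 700
n_per_class = int(np.ceil(final_size / len(counts)))        # 234

# Transformed copies generated per *existing* example of each class:
per_example = [int(np.ceil((n_per_class - c) / c)) for c in counts]
print(per_example)  # [2, 4, 9] -- rarer classes get more copies each
```

After adding the copies, the toy classes land at 300, 250 and 250 examples, which is the flattening effect visible in the bar chart above.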
In [ ]:
'''### Convert the images to grayscale ###

print('Converting images to Grayscale...')

train_features = [cv2.cvtColor(train_features[n,:,:,:], cv2.COLOR_BGR2GRAY) 
                for n in range(np.shape(train_features)[0])]
test_features = [cv2.cvtColor(test_features[n,:,:,:], cv2.COLOR_BGR2GRAY) 
                for n in range(np.shape(test_features)[0])]
                
train_features = np.reshape(train_features, (np.shape(train_features)[0],32,32,1))
test_features = np.reshape(test_features, (np.shape(test_features)[0],32,32,1))

print('Grayscale Conversion Complete.')'''
In [ ]:
### Sample grayscale image ###

'''plt.imshow(train_features[200000], cmap='gray')
plt.title('Sample Gray Image')
plt.show()'''
In [ ]:
### Sharpen images to try to optimize features for training ###
'''
print("Sharpening images...")
#sharp_image = scipy.misc.imfilter(train_features, 'sharpen')
blurred_image = ndimage.gaussian_filter(train_features, 0)
blurred_image_filter = ndimage.gaussian_filter(blurred_image, 0.2)
alpha = 255
sharpened_image1 = blurred_image + alpha * (blurred_image - blurred_image_filter)
sharpened_image2 = sharpened_image1 + alpha * (sharpened_image1 - blurred_image_filter)
sharpened_image3 = sharpened_image2 + alpha * (sharpened_image2 - blurred_image_filter)
sharpened_image4 = sharpened_image3 + alpha * (sharpened_image3 - blurred_image_filter)
sharpened_image5 = sharpened_image4 + alpha * (sharpened_image4 - blurred_image_filter)
sharpened_image6 = sharpened_image5 + alpha * (sharpened_image5 - blurred_image_filter)

print("Sharpening Complete.")'''
In [10]:
### Normalize training and test features ###

print('Normalizing features...')
train_features = train_features / 255.
test_features = test_features / 255.
print('Normalizing complete.')
Normalizing features...
Normalizing complete.
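
The normalization step is just a scalar divide, but it also silently promotes the arrays from `uint8` to float, which is what the network expects. A minimal sketch on a stand-in batch:

```python
import numpy as np

# Stand-in batch of two 2x2 RGB "images" with 8-bit pixel values.
batch = np.array([[[[0, 128, 255]] * 2] * 2] * 2, dtype=np.uint8)

# Dividing by 255. rescales pixels to [0, 1] and promotes uint8 to float64.
normalized = batch / 255.

print(normalized.dtype)                    # float64
print(normalized.min(), normalized.max())  # 0.0 1.0
```

Keeping the input range small and centered near the weight-initialization scale is what makes this step help training converge.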

Implementation

Use the code cell (or multiple code cells, if necessary) to implement the first step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [12]:
### Get randomized datasets for training and validation ###

print('Randomizing datasets...')
from sklearn.model_selection import train_test_split
train_features, valid_features, train_labels, valid_labels = train_test_split(
   train_features,
   train_labels,
   test_size=0.2,
   random_state=121
)
print("Randomized datasets complete.")
Randomizing datasets...
Randomized datasets complete.
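
Under the hood, `train_test_split` amounts to a seeded shuffle followed by a slice. A numpy-only sketch of the same 80/20 split on toy data:

```python
import numpy as np

# Toy stand-ins for the feature and label arrays.
X = np.arange(10 * 4).reshape(10, 4)
y = np.arange(10)

rng = np.random.RandomState(121)   # fixed seed, mirroring random_state=121
idx = rng.permutation(len(X))      # shuffle indices once

n_valid = int(0.2 * len(X))        # hold out 20% for validation
valid_idx, train_idx = idx[:n_valid], idx[n_valid:]

X_train, y_train = X[train_idx], y[train_idx]
X_valid, y_valid = X[valid_idx], y[valid_idx]
print(len(X_train), len(X_valid))  # 8 2
```

Indexing features and labels with the same permutation keeps each image paired with its class id through the shuffle.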

Question 1

Describe how you preprocessed the data. Why did you choose that technique?

Answer:

After looking at a few of the training examples and plotting the training data, I decided to focus on three main aspects of image preprocessing. The first thing I noticed about the training data was the very uneven class distribution. Only about 7 or 8 classes had a large number of samples (around 2,000), and most of the remaining classes had a much smaller sample size. I decided to create additional samples for the underrepresented classes by rotating the original samples in one-degree increments. I didn't want to rotate the images too much, because changing the orientation of the road signs could harm the training process. After adding the rotated images, I created still more images by "squeezing" and "stretching" them with perspective transforms. The end result was a very even class distribution and a much larger data set (about 9 times larger) than the original 39,209 images.

After I created the additional samples, I tried to sharpen the images in an attempt to extract more features. This didn't seem to improve the training; I suspect this is because the images are low resolution. I also tried converting the images to grayscale before training. Training was much more efficient, but the training/test accuracy was slightly lower than with the color images. Therefore, I decided not to blur or grayscale the images, since both would leave fewer features for the neural network to train on. I also kept the color images because color is very important for recognizing/classifying road signs and I didn't want to remove those features from the training.

The final step in my preprocess stage was to normalize the image pixel values in order to reduce the image variability. This seemed to help quite a bit with making the training more efficient.

Final Preprocessing Method

I decided to preprocess the images with the following two step process:

  • Create additional images by rotating and transforming the original images
  • Normalize image pixel values

Question 2

Describe how you set up the training, validation and testing data for your model. Optional: If you generated additional data, how did you generate the data? Why did you generate the data? What are the differences in the new dataset (with generated data) from the original dataset?

Answer:

I set up the training, validation and testing data sets using a standard cross-validation approach, reserving 20% of the data for validation and 80% for training. The original testing data set was left unmodified.

I decided to create additional samples by rotating the original images in one-degree increments from 10 to -10 degrees and also by transforming the images in four different ways. I generated additional data because the original data set had a very uneven distribution of sample images per class: some classes had around 2,000 samples while others had only around 200. Compared with the original data set, the new data set is much larger overall and its class distribution is much more uniform after the rotated/transformed images were added.

In [4]:
### Define your architecture here.
### Feel free to use as many code cells as needed.

EPOCHS = 20
BATCH_SIZE = 150


def LeNet(x):    
    # Hyperparameters
    mu = 0
    sigma = 0.1
    
    # Layer 1: Convolutional. Input = 32x32x3. Output = 28x28x6.
    conv1_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 3, 6), mean = mu, stddev = sigma))
    conv1_b = tf.Variable(tf.zeros(6))
    conv1   = tf.nn.conv2d(x, conv1_W, strides=[1, 1, 1, 1], padding='VALID') + conv1_b

    # 1st activation 
    conv1 = tf.nn.relu(conv1)

    # 1st max pooling layer, Input = 28x28x6. Output = 14x14x6.
    conv1 = tf.nn.max_pool(conv1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Layer 2: Convolutional. Output = 10x10x16.
    conv2_W = tf.Variable(tf.truncated_normal(shape=(5, 5, 6, 16), mean = mu, stddev = sigma))
    conv2_b = tf.Variable(tf.zeros(16))
    conv2   = tf.nn.conv2d(conv1, conv2_W, strides=[1, 1, 1, 1], padding='VALID') + conv2_b
    
    # 2nd activation.
    conv2 = tf.nn.relu(conv2)

    # 2nd max Pooling. Input = 10x10x16. Output = 5x5x16.
    conv2 = tf.nn.max_pool(conv2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')

    # Flatten. Input = 5x5x16. Output = 400.
    fc0   = flatten(conv2)
    
    # Layer 3: Fully Connected. Input = 400. Output = 120.
    fc1_W = tf.Variable(tf.truncated_normal(shape=(400, 120), mean = mu, stddev = sigma))
    fc1_b = tf.Variable(tf.zeros(120))
    fc1   = tf.matmul(fc0, fc1_W) + fc1_b
    
    # 3rd activation.
    fc1    = tf.nn.relu(fc1)

    # Layer 4: Fully Connected. Input = 120. Output = 84.
    fc2_W  = tf.Variable(tf.truncated_normal(shape=(120, 84), mean = mu, stddev = sigma))
    fc2_b  = tf.Variable(tf.zeros(84))
    fc2    = tf.matmul(fc1, fc2_W) + fc2_b
    
    # 4th activation.
    fc2    = tf.nn.relu(fc2)

    # Layer 5: Fully Connected. Input = 84. Output = 43.
    fc3_W  = tf.Variable(tf.truncated_normal(shape=(84, 43), mean = mu, stddev = sigma))
    fc3_b  = tf.Variable(tf.zeros(43))
    logits = tf.matmul(fc2, fc3_W) + fc3_b
    
    return logits
In [5]:
### Features and Labels ###

### x is a placeholder for a batch of input images
### y is a placeholder for a batch of labels

x = tf.placeholder(tf.float32, (None, 32, 32, 3))
y = tf.placeholder(tf.int32, (None))
one_hot_y = tf.one_hot(y, 43)
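
`tf.one_hot` maps each integer label to a 43-way indicator vector. The numpy equivalent, using hypothetical class ids, is a row lookup into an identity matrix:

```python
import numpy as np

n_classes = 43
labels = np.array([0, 17, 42])   # hypothetical class ids for a batch of 3

# Row i of the identity matrix is the one-hot vector for class i.
one_hot = np.eye(n_classes, dtype=np.float32)[labels]

print(one_hot.shape)        # (3, 43)
print(one_hot[1].argmax())  # 17 -- the hot index recovers the label
```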

Question 3

What does your final architecture look like? (Type of model, layers, sizes, connectivity, etc.) For reference on how to build a deep neural network using TensorFlow, see Deep Neural Network in TensorFlow from the classroom.

Answer:

My architecture is directly based on the LeNet convolutional neural network from lesson 9. The original LeNet network performed very well after more data was introduced and normalized. The only modifications were the input dimensions and the final output dimension.

Network Layout:

  • First layer: Convolutional layer with a patch size of 5x5, a stride of 1, VALID padding and a depth of 6
  • ReLU activation
  • Max Pooling with a 2x2 kernel and a stride of 2
  • Second layer: Convolutional layer with a patch size of 5x5, a stride of 1, VALID padding and a depth of 16
  • ReLU activation
  • Max Pooling with a 2x2 kernel and a stride of 2
  • Flatten 5x5x16 into 400x1
  • Third layer: Fully connected layer (input=400, output=120)
  • ReLU activation
  • Fourth layer: Fully connected layer (input=120, output=84)
  • ReLU activation
  • Fifth layer: Final fully connected layer with an output of 43 logits (number of classes)
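
The layer sizes listed above follow from the VALID-padding formula out = (in - kernel) / stride + 1 for convolutions, with each 2x2, stride-2 pool halving the spatial dimensions. A quick sanity check of the whole stack:

```python
def conv_valid(size, kernel, stride=1):
    # VALID padding: output = (input - kernel) // stride + 1
    return (size - kernel) // stride + 1

def pool2(size):
    # 2x2 max pooling with stride 2 halves each spatial dimension
    return size // 2

s = 32
s = conv_valid(s, 5)   # conv1: 32 -> 28
s = pool2(s)           # pool1: 28 -> 14
s = conv_valid(s, 5)   # conv2: 14 -> 10
s = pool2(s)           # pool2: 10 -> 5
flat = s * s * 16      # flatten 5x5x16
print(flat)            # 400
```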
In [6]:
### Train your model here.
### Feel free to use as many code cells as needed.

### Training Pipeline ###

rate = 0.001

logits = LeNet(x)
cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=one_hot_y)
loss_operation = tf.reduce_mean(cross_entropy)
optimizer = tf.train.AdamOptimizer(learning_rate = rate)
training_operation = optimizer.minimize(loss_operation)
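
For reference, softmax cross entropy can be written out in a few lines of numpy. This is a sketch with hypothetical logits, using the usual max-subtraction trick for numerical stability:

```python
import numpy as np

# Hypothetical logits for a batch of 2 examples over 4 classes.
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 0.5, 3.0, 0.0]])
labels = np.array([0, 2])          # true class ids
one_hot = np.eye(4)[labels]

# Numerically stable softmax: subtract each row's max before exponentiating.
shifted = logits - logits.max(axis=1, keepdims=True)
probs = np.exp(shifted) / np.exp(shifted).sum(axis=1, keepdims=True)

# Cross entropy per example, then the batch mean (the training loss).
xent = -(one_hot * np.log(probs)).sum(axis=1)
loss = xent.mean()
print(loss)
```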
In [7]:
### Model Evaluation ###

correct_prediction = tf.equal(tf.argmax(logits, 1), tf.argmax(one_hot_y, 1))
accuracy_operation = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
saver = tf.train.Saver()

def evaluate(X_data, y_data):
    num_examples = len(X_data)
    total_accuracy = 0
    sess = tf.get_default_session()
    for offset in range(0, num_examples, BATCH_SIZE):
        batch_x, batch_y = X_data[offset:offset+BATCH_SIZE], y_data[offset:offset+BATCH_SIZE]
        accuracy = sess.run(accuracy_operation, feed_dict={x: batch_x, y: batch_y})
        total_accuracy += (accuracy * len(batch_x))
    return total_accuracy / num_examples
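
`evaluate` weights each batch's accuracy by its batch size so that a short final batch doesn't skew the average. The same bookkeeping in plain numpy, with hypothetical predictions:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 1, 1])  # 7 labels
y_pred = np.array([1, 0, 0, 1, 0, 1, 1])  # one mistake
BATCH = 3                                  # final batch holds only 1 example

total = 0.0
for offset in range(0, len(y_true), BATCH):
    bt = y_true[offset:offset + BATCH]
    bp = y_pred[offset:offset + BATCH]
    acc = (bt == bp).mean()
    total += acc * len(bt)   # weight each batch's accuracy by its size

overall = total / len(y_true)
print(overall)               # matches the unbatched accuracy, 6/7
```

Averaging the three per-batch accuracies directly would overweight the single-example batch; the size-weighted sum avoids that.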
In [17]:
### Train Model ###

from sklearn.utils import shuffle
with tf.Session() as sess:
    
    sess.run(tf.global_variables_initializer())
    num_examples = len(train_features)
    
    print("Training...")
    print()
    for i in range(EPOCHS):
        train_features, train_labels = shuffle(train_features, train_labels)
        for offset in range(0, num_examples, BATCH_SIZE):
            end = offset + BATCH_SIZE
            batch_x, batch_y = train_features[offset:end], train_labels[offset:end]
            sess.run(training_operation, feed_dict={x: batch_x, y: batch_y})
            
        validation_accuracy = evaluate(valid_features, valid_labels)
        print("EPOCH {} ...".format(i+1))
        print("Validation Accuracy = {:.3f}".format(validation_accuracy))
        print()
        
    saver.save(sess, '/Users/Sean/CarND-Term1-Starter-Kit/model_4.ckpt')
    print("Model saved")
Training...

EPOCH 1 ...
Validation Accuracy = 0.914

EPOCH 2 ...
Validation Accuracy = 0.955

EPOCH 3 ...
Validation Accuracy = 0.971

EPOCH 4 ...
Validation Accuracy = 0.976

EPOCH 5 ...
Validation Accuracy = 0.982

EPOCH 6 ...
Validation Accuracy = 0.981

EPOCH 7 ...
Validation Accuracy = 0.985

EPOCH 8 ...
Validation Accuracy = 0.989

EPOCH 9 ...
Validation Accuracy = 0.981

EPOCH 10 ...
Validation Accuracy = 0.990

EPOCH 11 ...
Validation Accuracy = 0.990

EPOCH 12 ...
Validation Accuracy = 0.986

EPOCH 13 ...
Validation Accuracy = 0.995

EPOCH 14 ...
Validation Accuracy = 0.990

EPOCH 15 ...
Validation Accuracy = 0.988

EPOCH 16 ...
Validation Accuracy = 0.994

EPOCH 17 ...
Validation Accuracy = 0.994

EPOCH 18 ...
Validation Accuracy = 0.990

EPOCH 19 ...
Validation Accuracy = 0.994

EPOCH 20 ...
Validation Accuracy = 0.995

Model saved
In [18]:
### Evaluate Model on Test data ###

with tf.Session() as session:
    saver.restore(session, '/Users/Sean/CarND-Term1-Starter-Kit/model_4.ckpt')
    print('Model restored with latest weights')

    test_accuracy = evaluate(test_features, test_labels)
    print("Test Accuracy = {:.3f}".format(test_accuracy))
Model restored with latest weights
Test Accuracy = 0.930

Question 4

How did you train your model? (Type of optimizer, batch size, epochs, hyperparameters, etc.)

Answer:

Optimizer: Adam Optimizer

Batch size: 150

Epochs: 20

I tried a few different learning rates, but 0.001 seemed to work very well and I didn't have any issues with getting stuck in local minima. I also left the truncated normal distribution mean at 0 and the standard deviation at 0.1. Biases were always initialized to zero.

I trained the model by trial and error, and since I was training on my local CPU, I kept the batch size and number of epochs relatively low.

Question 5

What approach did you take in coming up with a solution to this problem? It may have been a process of trial and error, in which case, outline the steps you took to get to the final solution and why you chose those steps. Perhaps your solution involved an already well known implementation or architecture. In this case, discuss why you think this is suitable for the current problem.

Answer:

I began training with LeNet, intending to try other network architectures as well, but since LeNet worked so well from the beginning I decided to keep it.

Most of my time was spent deciding on the best method to preprocess the data. Once preprocessing was finished, I focused on modifying the patch sizes, strides, batch sizes, etc. through trial and error. In the end, I kept all the original hyperparameters since they produced the best results.

I believe the LeNet model worked very well in its original form because the training images were all nicely formatted (centered, cropped, close-up), which made it easy for the model to classify them with considerable accuracy. This makes sense, since LeNet was originally developed to classify digits by training on small, centered images. The model didn't have to search a huge image just to pick out a small traffic sign at the edge of the frame.


Step 3: Test a Model on New Images

Take several pictures of traffic signs that you find on the web or around you (at least five), and run them through your classifier on your computer to produce example results. The classifier might not recognize some local signs but it could prove interesting nonetheless.

You may find signnames.csv useful as it contains mappings from the class id (integer) to the actual sign name.

Implementation

Use the code cell (or multiple code cells, if necessary) to implement this step of your project. Once you have completed your implementation and are satisfied with the results, be sure to thoroughly answer the questions that follow.

In [8]:
### Load the images and plot them here.
### Feel free to use as many code cells as needed.

### Import and plot 10 test images which were collected from the internet ###

test_imgs = np.uint8(np.zeros((10,32,32,3)))

for i in range(1, 11):
    image = io.imread('/Users/Sean/CarND-Term1-Starter-Kit/test_images/pic{}.jpg'.format(str(i)))
    test_imgs[i-1] = image

test_img_data = test_imgs.reshape((10, 32, 32, 3)).astype(np.float32)
new_images = []

for i in range(0, 10):
    print('Test Image: ', i+1)
    new_images.append(test_imgs[i])
    plt.imshow(new_images[i])
    plt.show()
    print(np.shape(new_images))
    #test_imgs = np.reshape(test_imgs, (tf.float32, (None, 32, 32, 3))  
    
Test Image:  1
(1, 32, 32, 3)
Test Image:  2
(2, 32, 32, 3)
Test Image:  3
(3, 32, 32, 3)
Test Image:  4
(4, 32, 32, 3)
Test Image:  5
(5, 32, 32, 3)
Test Image:  6
(6, 32, 32, 3)
Test Image:  7
(7, 32, 32, 3)
Test Image:  8
(8, 32, 32, 3)
Test Image:  9
(9, 32, 32, 3)
Test Image:  10
(10, 32, 32, 3)

Question 6

Choose five candidate images of traffic signs and provide them in the report. Are there any particular qualities of the image(s) that might make classification difficult? It could be helpful to plot the images in the notebook.

In [9]:
f, axarr = plt.subplots(5, 1)
for i in range(5):
    
    axarr[i].imshow(test_imgs[i])
    plt.setp(axarr[i].get_xticklabels(), visible=False)
    plt.setp(axarr[i].get_yticklabels(), visible=False)
plt.show()

Answer:

Image 1: The U.S. "Do Not Enter" sign is slightly different from the German version in the training set, so the model might not classify it correctly.

Image 2: The U.S. "Deer Crossing" sign is shaped differently from the German sign in the training set, but the pictogram is very similar, so it may or may not be classified correctly.

Image 3: The "Pedestrian Crossing" sign should not be a problem for the model to classify correctly, since it is present in the training set.

Image 4: The image and shape of the U.S. "Slippery Road" sign are very different from the German version in the training set, so the model will probably not classify it correctly.

Image 5: The U.S. "No U-Turn" sign isn't present in the training set at all, so the model will most likely have difficulty classifying it.

Question 7

Is your model able to perform equally well on captured pictures when compared to testing on the dataset? The simplest way to do this is to check the accuracy of the predictions. For example, if the model predicted 1 out of 5 signs correctly, it's 20% accurate.

NOTE: You could check the accuracy manually by using signnames.csv (same directory). This file has a mapping from the class id (0-42) to the corresponding sign name. So, you could take the class id the model outputs, lookup the name in signnames.csv and see if it matches the sign from the image.

Answer:

I used my model to make predictions for the five captured images pictured above. The resulting prediction accuracy for the captured images was 20%, whereas the model's accuracy on the test set was 93%.

Therefore, I believe my model did not perform well in this real-world situation. One reason is that four of the captured images were not present in the training set: they were U.S. road signs rather than the German road signs the model was trained on. The U.S. signs look very different from the German versions of the same signs, and the model had no prior exposure to their features.

Another reason my model did not perform well is that the captured images were not photographs of real-world scenes. They were computer-generated renderings against a blank white background, which may have thrown off the model and contributed to the low prediction accuracy.

Results:

Captured Images Model Prediction Accuracy: 20%

Test Dataset Model Prediction Accuracy: 93%

  • Image 1 "No Entry"

    Model Prediction: (17) "No Entry" - Correct Prediction

  • Image 2 "Wild Animals Crossing"

    Model Prediction: (10) "No Passing for Vehicles Over 3.5 Metric Tons" - Incorrect Prediction

  • Image 3 "Pedestrians"

    Model Prediction: (24) "Road Narrows on Right" - Incorrect Prediction

  • Image 4 "Slippery Road"

    Model Prediction: (17) "No Entry" - Incorrect Prediction

  • Image 5 "No U-Turn"

    Model Prediction: (12) "Priority Road" - Incorrect Prediction

In [10]:
import tensorflow as tf
with tf.Session() as sess:
    saver.restore(sess, '/Users/Sean/CarND-Term1-Starter-Kit/model_4.ckpt')
    print('Model restored with latest weights')
    
    ### model evaluation on test images collected from the internet ###
    
    prediction = tf.argmax(logits, 1)

    test_prediction = sess.run(prediction, feed_dict={x: test_img_data})
    print('Test Image Predictions: ', test_prediction)
Model restored with latest weights
Test Image Predictions:  [17 10 24 17 12 14 12 14  9 17]
In [11]:
### Use Softmax and top_K functions to determine prediction probabilites ###

import tensorflow as tf
with tf.Session() as sess:
    saver.restore(sess, '/Users/Sean/CarND-Term1-Starter-Kit/model_4.ckpt')
    print('Model restored with latest weights')
    
    prediction = tf.nn.softmax(logits)
    topFive = tf.nn.top_k(prediction, k=5, sorted=True)
    top_k_feed_dict = {x: test_img_data}

    print('Softmax/Top_K Results (values and indices): ', sess.run(topFive, feed_dict = top_k_feed_dict))    
    
Model restored with latest weights
Softmax/Top_K Results (values and indices):  TopKV2(values=array([[ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.],
       [ 1.,  0.,  0.,  0.,  0.]], dtype=float32), indices=array([[17,  0,  1,  2,  3],
       [10,  0,  1,  2,  3],
       [24,  0,  1,  2,  3],
       [17,  0,  1,  2,  3],
       [12,  0,  1,  2,  3],
       [14,  0,  1,  2,  3],
       [12,  0,  1,  2,  3],
       [14,  0,  1,  2,  3],
       [ 9,  0,  1,  2,  3],
       [17,  0,  1,  2,  3]]))
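For reference, `tf.nn.top_k` returns the k largest entries in each row together with their column indices, sorted in descending order. A NumPy equivalent (a sketch for intuition, not the TensorFlow implementation) looks like this:

```python
import numpy as np

def top_k(probs, k=5):
    # Column indices of the k largest entries in each row, descending
    idx = np.argsort(probs, axis=1)[:, ::-1][:, :k]
    # Gather the probability values at those indices
    vals = np.take_along_axis(probs, idx, axis=1)
    return vals, idx

probs = np.array([[0.1, 0.7, 0.05, 0.15],
                  [0.6, 0.1, 0.2, 0.1]])
vals, idx = top_k(probs, k=2)
print(idx)   # [[1 3]
             #  [0 2]]
print(vals)  # [[0.7  0.15]
             #  [0.6  0.2 ]]
```

In the output above, every row's top value is exactly 1.0 and the remaining slots fall back to the lowest class indices, because all other softmax entries round to 0.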
In [12]:
### Store Values and Indices ###

with tf.Session() as session:
    saver.restore(session, '/Users/Sean/CarND-Term1-Starter-Kit/model_4.ckpt')
    print('Model restored with latest weights')

    top_k_probabilities_per_image = session.run(topFive, feed_dict=top_k_feed_dict)

    values = np.array([top_k_probabilities_per_image.values])
    indices = np.array([top_k_probabilities_per_image.indices])
Model restored with latest weights
In [13]:
### Visualize the softmax probabilities here ###


def plot_top_k_probabilities(pred_cls, pred_prob, title):
    # Scatter the top-k class IDs against their softmax probabilities
    plt.plot(pred_cls, pred_prob, 'go')
    plt.ylim(0, 1.1)
    plt.xlim(-1, 45)
    plt.ylabel('Probability')
    plt.xlabel('Predicted Class')
    plt.title(title)
    plt.show()

# values and indices have shape (1, num_images, 5);
# plot the top-5 class probabilities for each captured image
for i in range(values.shape[1]):
    plot_top_k_probabilities(indices[0][i], values[0][i],
                             'Test Image {} Prediction Certainty'.format(i + 1))

Question 8

Use the model's softmax probabilities to visualize the certainty of its predictions; tf.nn.top_k could prove helpful here. Which predictions is the model certain of? Uncertain? If the model was incorrect in its initial prediction, does the correct prediction appear in the top k? (k should be 5 at most)

Answer:

The model reports 100% certainty for every prediction. This is not a property of the one-hot encoded labels; it happens because the trained network outputs logits with very large magnitudes, so the softmax saturates and the top probability rounds to 1.0 at float32 precision, leaving no visible probability mass on the other classes.

The model only predicted the first image correctly. The remaining incorrect predictions did not include the correct class in the top 5 predictions.
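The saturated probabilities (every top value exactly 1.0) are characteristic of softmax applied to large-magnitude logits. A small NumPy sketch reproduces the effect:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: subtract the max before exponentiating
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

# Moderate logits yield a spread-out distribution
print(softmax(np.array([2.0, 1.0, 0.5])))

# Large-magnitude logits saturate: the top class rounds to exactly 1.0
probs = softmax(np.array([80.0, 20.0, 10.0], dtype=np.float32))
print(probs)  # [1. 0. 0.]
```

Because exp(-60) is far below floating-point precision relative to 1.0, the runner-up classes contribute nothing visible, which is why the top-k output above shows a probability of 1.0 for every image.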

Note: Once you have completed all of the code implementations and successfully answered each question above, you may finalize your work by exporting the iPython Notebook as an HTML document. You can do this by using the menu above and navigating to File -> Download as -> HTML (.html). Include the finished document along with this notebook as your submission.

In [ ]: